这里介绍了人工智能研究所(IARAI)组织的2022年Landslide4sense(L4S)竞赛的科学结果。竞争的目的是根据全球收集的卫星图像的大规模多个来源自动检测滑坡。 2022 L4S旨在促进有关使用卫星图像的语义分割任务的深度学习模型(DL)模型最新发展的跨学科研究。在过去的几年中,由于卷积神经网络(CNN)的发展,基于DL的模型已经达到了对图像解释的期望。本文的主要目的是介绍本次比赛中介绍的细节和表现最佳的算法。获胜的解决方案详细介绍了Swin Transformer,Segformer和U-NET等最先进的模型。还考虑了先进的机器学习技术和诸如硬采矿,自我培训和混合数据增强之类的策略。此外,我们描述了L4S基准数据集,以促进进一步的比较,并在线报告准确性评估的结果。可以在\ textIt {未来开发排行榜上访问数据,以供将来评估,\ url {https://www.iarai.ac.ac.at/landslide4sense/challenge/},并邀请研究人员提交更多预测结果,评估准确性在他们的方法中,将它们与其他用户的方法进行比较,理想情况下,改善了本文报告的滑坡检测结果。
translated by 谷歌翻译
创伤性脑损伤(TBI)患者的脑网络分析对于其意识水平评估和预后评估至关重要,这需要分割某些意识相关的大脑区域。但是,由于很难收集TBI患者的手动注释的MR扫描,因此很难构建TBI分割模型。数据增强技术可用于缓解数据稀缺问题。但是,常规数据增强策略(例如空间和强度转化)无法模仿创伤性大脑中的变形和病变,这限制了后续分割任务的性能。为了解决这些问题,我们提出了一种名为TBIGA的新型医学图像授课模型,以通过配对的脑标签图合成TBI MR扫描。我们的TBIGAN方法的主要优势在于,它可以同时生成TBI图像和相应的标签映射,这在以前的医学图像的先前涂上方法中尚未实现。我们首先按照粗到细节的方式在边缘信息的指导下生成成分的图像,然后将合成强度图像用作标签上填充的先验。此外,我们引入了基于注册的模板增强管道,以增加合成图像对的多样性并增强数据增强能力。实验结果表明,提出的TBIGAN方法可以产生具有高质量和有效标签图的足够合成的TBI图像,这可以大大改善与替代方案相比的2D和3D创伤性脑部分割性能。
translated by 谷歌翻译
具有大量空间和时间跨境的情景中的人重新识别(RE-ID)尚未完全探索。这部分原因是,现有的基准数据集主要由有限的空间和时间范围收集,例如,使用在校园特定区域的相机录制的视频中使用的视频。这种有限的空间和时间范围使得难以模拟真实情景中的人的困难。在这项工作中,我们贡献了一个新的大型时空上次最后一个数据集,包括10,862个图像,具有超过228k的图像。与现有数据集相比,最后一个具有挑战性和高度多样性的重新ID设置,以及显着更大的空间和时间范围。例如,每个人都可以出现在不同的城市或国家,以及在白天到夜间的各个时隙,以及春季到冬季的不同季节。为了我们的最佳知识,最后是一个新的Perse Re-ID数据集,具有最大的时空范围。基于最后,我们通过对14个RE-ID算法进行全面的绩效评估来验证其挑战。我们进一步提出了一种易于实施的基线,适用于如此挑战的重新ID设置。我们还验证了初步训练的模型可以在具有短期和更改方案的现有数据集中概括。我们期待持续激发未来的工程,以更现实和挑战的重新识别任务。有关DataSet的更多信息,请访问https://github.com/shuxjweb/last.git。
translated by 谷歌翻译
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by 谷歌翻译
Deep learning classifiers provide the most accurate means of automatically diagnosing diabetic retinopathy (DR) based on optical coherence tomography (OCT) and its angiography (OCTA). The power of these models is attributable in part to the inclusion of hidden layers that provide the complexity required to achieve a desired task. However, hidden layers also render algorithm outputs difficult to interpret. Here we introduce a novel biomarker activation map (BAM) framework based on generative adversarial learning that allows clinicians to verify and understand classifiers decision-making. A data set including 456 macular scans were graded as non-referable or referable DR based on current clinical standards. A DR classifier that was used to evaluate our BAM was first trained based on this data set. The BAM generation framework was designed by combing two U-shaped generators to provide meaningful interpretability to this classifier. The main generator was trained to take referable scans as input and produce an output that would be classified by the classifier as non-referable. The BAM is then constructed as the difference image between the output and input of the main generator. To ensure that the BAM only highlights classifier-utilized biomarkers an assistant generator was trained to do the opposite, producing scans that would be classified as referable by the classifier from non-referable scans. The generated BAMs highlighted known pathologic features including nonperfusion area and retinal fluid. A fully interpretable classifier based on these highlights could help clinicians better utilize and verify automated DR diagnosis.
translated by 谷歌翻译
Safety-critical Autonomous Systems require trustworthy and transparent decision-making process to be deployable in the real world. The advancement of Machine Learning introduces high performance but largely through black-box algorithms. We focus the discussion of explainability specifically with Autonomous Vehicles (AVs). As a safety-critical system, AVs provide the unique opportunity to utilize cutting-edge Machine Learning techniques while requiring transparency in decision making. Interpretability in every action the AV takes becomes crucial in post-hoc analysis where blame assignment might be necessary. In this paper, we provide positioning on how researchers could consider incorporating explainability and interpretability into design and optimization of separate Autonomous Vehicle modules including Perception, Planning, and Control.
translated by 谷歌翻译
Graphic sketch representations are effective for representing sketches. Existing methods take the patches cropped from sketches as the graph nodes, and construct the edges based on sketch's drawing order or Euclidean distances on the canvas. However, the drawing order of a sketch may not be unique, while the patches from semantically related parts of a sketch may be far away from each other on the canvas. In this paper, we propose an order-invariant, semantics-aware method for graphic sketch representations. The cropped sketch patches are linked according to their global semantics or local geometric shapes, namely the synonymous proximity, by computing the cosine similarity between the captured patch embeddings. Such constructed edges are learnable to adapt to the variation of sketch drawings, which enable the message passing among synonymous patches. Aggregating the messages from synonymous patches by graph convolutional networks plays a role of denoising, which is beneficial to produce robust patch embeddings and accurate sketch representations. Furthermore, we enforce a clustering constraint over the embeddings jointly with the network learning. The synonymous patches are self-organized as compact clusters, and their embeddings are guided to move towards their assigned cluster centroids. It raises the accuracy of the computed synonymous proximity. Experimental results show that our method significantly improves the performance on both controllable sketch synthesis and sketch healing.
translated by 谷歌翻译
Unsupervised domain adaptation (UDA) has been highly successful in transferring knowledge acquired from a label-rich source domain to a label-scarce target domain. Open-set domain adaptation (ODA) and universal domain adaptation (UNDA) have been proposed as solutions to the problem concerning the presence of additional novel categories in the target domain. Existing ODA and UNDA approaches treat all novel categories as one unified unknown class and attempt to detect this unknown class during the training process. We find that domain variance leads to more significant view-noise in unsupervised data augmentation, affecting the further applications of contrastive learning~(CL), as well as the current closed-set classifier and open-set classifier causing the model to be overconfident in novel class discovery. To address the above two issues, we propose Soft-contrastive All-in-one Network~(SAN) for ODA and UNDA tasks. SAN includes a novel data-augmentation-based CL loss, which is used to improve the representational capability, and a more human-intuitive classifier, which is used to improve the new class discovery capability. The soft contrastive learning~(SCL) loss is used to weaken the adverse effects of the data-augmentation label noise problem, which is amplified in domain transfer. The All-in-One~(AIO) classifier overcomes the overconfidence problem of the current mainstream closed-set classifier and open-set classifier in a more human-intuitive way. The visualization results and ablation experiments demonstrate the importance of the two proposed innovations. Moreover, extensive experimental results on ODA and UNDA show that SAN has advantages over the existing state-of-the-art methods.
translated by 谷歌翻译
The miniaturization and mobility of computer vision systems are limited by the heavy computational burden and the size of optical lenses. Here, we propose to use a ultra-thin diffractive optical element to implement passive optical convolution. A division adjoint opto-electronic co-design method is also proposed. In our simulation experiments, the first few convolutional layers of the neural network can be replaced by optical convolution in a classification task on the CIFAR-10 dataset with no power consumption, while similar performance can be obtained.
translated by 谷歌翻译
机器人定位是使用地图和传感器测量结果找到机器人姿势的反问题。近年来,可逆神经网络(INNS)成功地解决了各个领域的模棱两可的反问题。本文提出了一个解决旅馆本地化问题的框架。我们设计了一个在逆路径中提供隐式映射表示形式的旅馆。通过对评估中的潜在空间进行采样,局部\ _inn输出机器人以协方差构成,可用于估计不确定性。我们表明,本地\ _inn的本地化性能与延迟较低的当前方法相当。我们使用训练集的外观显示了从本地\ _inn的详细的2D和3D地图重建。我们还使用本地\ _inn提供了全球本地化算法来解决绑架问题。
translated by 谷歌翻译